Phylogenomics of trans-Andean tetras of the genus Hyphessobrycon Durbin 1908 (Stethaprioninae: Characidae) and colonization patterns of Middle America

Hyphessobrycon is one of the most species rich and widely distributed genera in the family Characidae, with more than 160 species ranging from Veracruz, Mexico to Mar Chiquita Lagoon in Buenos Aires, Argentina. The majority of Hyphessobrycon diversity shows a cis-Andean distribution; only nine species are trans-Andean including H. compressus (Meek 1908). It is well established that Hyphessobrycon is not monophyletic but it has been suggested that natural groups can be identified within the larger Hyphessobrycon species group. In this study, we tested the monophyly of trans-Andean species of Hyphessobrycon and investigated the placement of H. compressus. We inferred the first phylogenomic hypothesis of trans-Andean Hyphessobrycon that includes nearly complete taxonomic sampling (eight of nine valid species) using ultraconserved elements (UCEs). We analyzed 75% (1682 UCEs), 90% (1258 UCEs), and 95% (838 UCEs) complete data matrices, and inferred phylogenomic hypotheses under concatenation and coalescent approaches. In all cases, we recovered the monophyly of trans-Andean Hyphessobrycon inclusive of H. compressus, strong support for three species groups, and evidence of cryptic diversity within the widespread H. compressus and H. condotensis. We used our phylogenomic hypothesis to investigate the biogeographic history of Hyphessobrycon in Middle America. Our ancestral range estimation analysis suggests a single event of cis- to trans-Andean colonization followed by stepwise colonization from the Pacific slope of northwestern South America (Chocó block) to northern Middle America (Maya block). Our work supports the recognition of the trans-Andean species as Hyphessobrycon sensu stricto and provides an evolutionary template to examine morphological characters that will allow us to better understand the diversity of Hyphessobrycon in Middle America.


Introduction
The Neotropics are home to the world's most diverse assemblage of freshwater fishes, with more than 6200 described species occupying a wide array of aquatic habitats spanning from the Central Mexican Plateau in North America to Tierra del Fuego in South America [1,2]. The species diversity of this fauna is dominated by the ostariophysan lineages Characiformes (tetras and their relatives), Siluriformes (catfishes), and Gymnotiformes (electric knifefishes), and by the acanthomorph lineages Cichlidae (cichlids) and Cyprinodontiformes (livebearers, killifishes, and their relatives) [2]. Given their high diversity and endemicity, Neotropical freshwater fishes provide exemplar clades to explore the factors that promote the generation of exceptional biodiversity within the region. Such investigations rely on the availability of robust phylogenetic hypotheses for clades of interest, but phylogenetic inference has often been challenged by the rapid diversification experienced by many of these lineages. While efforts to elucidate the evolutionary history of Neotropical freshwater fishes have vastly benefited from the application of phylogenomic methods [e.g., [3][4][5][6][7][8][9][10][11], there remains considerable uncertainty about phylogenetic relationships and species delimitations, particularly at recent time scales.
Previous systematic work based on both morphological and molecular data has shown that Hyphessobrycon is not monophyletic [14,17,31]. Instead, species currently classified within Hyphessobrycon belong to multiple different characin lineages, highlighting the need for a phylogenetically-informed taxonomic revision of the genus as currently described [e.g., 12,14,26,32]. Furthermore, the phylogenetic placement of H. compressus (Meek, 1904) [27], the type species and most northernly distributed species of Hyphessobrycon, is contentious. Two alternative hypotheses regarding the evolutionary relationships of H. compressus have been proposed on the basis of morphological characters and coloration patterns: A) a close relationship to cis-Andean species of the 'rosy tetra' clade [31,32] or B) a close relationship with all trans-Andean Hyphessobrycon [33]. At present, the lack of a robust hypothesis of relationships of trans-Andean Hyphessobrycon hinders our understanding of the systematics (i.e., which species belong to Hyphessobrycon sensu stricto) and the evolutionary history (e.g., biogeographic history) of this group in the northern Neotropics.
Here, we used ultraconserved elements (UCEs) [34], to infer the phylogenomic relationships among trans-Andean species of Hyphessobrycon under concatenation and coalescent approaches. Specifically, we tested whether trans-Andean species of Hyphessobrycon are monophyletic, investigated the phylogenetic placement of the type species H. compressus, and explored if gene tree heterogeneity present in our dataset provided support for alternative

Bioinformatics
We used the software PHYLUCE v1.7-1 [41] to perform bioinformatic processing of sequencing reads, identification and extraction of UCE loci from reads, and alignment of UCE loci across samples for downstream phylogenomic inference. Adapter contamination and lowquality reads were removed from our newly generated demultiplexed raw reads using Illumiprocessor [42], a wrapper for Trimmomatic [43] implemented in the PHYLUCE pipeline using the command illuminoprocessor. For the Melo data we used fastp [44] implemented in YAP v0.2.1 (Yet-Another-Pipeline) [45] to perform the cleaning of this dataset. The input file for fastp was generated using the command new -d and then we used the command qc -i to clean the raw data. Finally, we combined all the clean reads (newly generated + Melo data; FASTA files) into a single folder to continue processing the data.
We assembled the data for each individual sample into contigs using SPAdes [46,47] within PHYLUCE using the program phyluce_assembly_assemblo_spades. To identify and extract UCEs we ran the assembled contigs for all the samples against the ostariophysan probe set (2,708 UCEs) [40] using the program phyluce_assembly_match_contigs_to_probes. Then we generated a list of all loci for all samples using the program phyluce_assembly_get_match_counts that was used to extract the FASTA data using the program phyluce_get_fastas_from_-match_counts. FASTA files were exploded using the program phyluce_assembly_explode_get_fastas_file to obtain summary statistics for each sample assembled. We aligned the UCEs using the MAFFT v.7 aligner [48], first with edge trimming followed by internal trimming using the program phyluce_seqcap_align. Subsequently, we trimmed locus alignments using Gblocks [49] using the program phyluce_align_get_gblock-s_trimmed_alignments_from_untrimmed. Trimmed alignments were cleaned to remove the locus name and just keep the sample name using the program phyluce_align_remove_locus_-name_from_files. Finally, we generated three data sets that varied in the level of completeness of each alignment (viz, 75, 90, and 95%) using phyluce_align_get_only_loci_with_min_taxa and calculated summary statistics for each alignment using SEGUL [50]. We used these datasets in downstream phylogenomic analyses. All the data processing steps described above follow standard protocols (https://phyluce.readthedocs.io/) and were implemented on the Louisiana State University High Performance Computing Cluster SuperMIC.

Phylogenomic inference
Phylogenomic hypotheses based on the concatenated alignments were inferred for all three datasets (i.e., 75, 90, and 95%) under a maximum likelihood approach implemented in IQ-TREE 2 [51]. Model selection for each UCE locus was performed using ModelFinder [52] implemented in IQ-TREE 2 followed by tree inference in the same run using the option -m TEST. Node support was evaluated using both Ultrafast boostrap (UFBoot2) [52,53] and the SH-like approximate ratio test (SH-aLRT) [54] with 1000 replicates each. Finally, we estimated gene trees and their support values (UFBoot2; 1000 replicates) for all loci for the three datasets (75,90, and 95%) in IQ-TREE 2 using the options -s and -bb.
We inferred a species tree using ASTRAL-III, a two-step method [i.e., inference of gene trees (step 1) and species tree (step 2)] that is consistent with the multispecies coalescent model [55]. ASTRAL-III uses gene trees as input for the inference of the species tree. To minimize the effects of gene tree estimation error, see Simmons and Gatesy [56], we collapsed nodes with values < = 50 bootstrap support (i.e., UFBoot2) in all gene trees in Newick utilities [57] using the option nw_ed b < = 50 prior to inferring the species tree. These "collapsed" gene trees were used as input to infer the species tree and to calculate local posterior probabilities (LPP) in ASTRAL-III. In order to have one species or lineage per tip represented in our inferred species tree, we mapped terminal branches of the gene trees representing the same species or lineage into the same taxon (i.e., terminal branch of the species tree) using the option -a in Astral-III.
Recently, it has been highlighted that standard measures of support in phylogenomic inference (i.e., bootstrap support and Bayesian posterior probabilities) have some limitations [58,59]. An expectation is that these measures of support will reach their maximum values even in cases of high gene tree discordance (see Thomson and Brown [59]) possibly leading to high support in topological hypotheses that might not accurately reflect the evolutionary history of the group of interest. We utilized gene concordance factors (gCF) [51,60] and normalized quartet scores (NQS) [61] as alternative measures of support of relationships in our concatenated and coalescent phylogenomic hypotheses respectively. We used the collapsed gene trees as input to calculate gene concordance factors (gCF), the frequency of the two alternative topologies (i.e., gene discordance factors; gDF1 and gDF2), and the proportion of gene trees discordant due to polyphyly (gDFP; see Mihn et al. [51]) in our concatenated topology in IQ-TREE 2 using the options-gcf and-df-tree. We annotated our species tree using the option -t 16 in ASTRAL-III to calculate the NQS of the main topology (preferred species tree) and its branches and the two alternative topologies. For both of these analyses we focused on the recovered relationships of trans-Andean species of Hyphessobrycon within Stethaprioninae (monophyly vs non-monophyly). Furthermore, we investigated the recovered relationships between species of trans-Andean Hyphessobrycon.

Investigation of cryptic diversity
We investigated if there is cryptic diversity within species of trans-Andean Hyphessobrycon by leveraging species delimitation methods that do not rely on prior assignment of samples to species or lineages. Instead of delimiting species our main goal was to use these approaches to explore if there are genetic discontinuities across currently recognized species of trans-Andean Hyphessobrycon. We generated a reduced UCE dataset (36 samples; S1 Table) with no missing data (i.e., all UCEs are present in all samples) that we analyzed using two different methods: DISSECT/STACEY [62,63] and SODA [64].
We discarded 10% of the sampled trees as burn-in and summarized the posterior distribution of sampled trees into a "species or minimal clusters tree" (SMC-Tree) [63] in TreeAnnotator v.1.8.0 [66]. The probabilities of two samples belonging to the same minimal cluster of individuals were visualized in R (version 3.6.1) [67] using the function plot.simmatrix of the R scrip plot.simmatrix.R (https://github.com/scrameri/smtools/tree/master/SpeciesDelimitation) that generates a probability matrix of assignment of samples and we used the SMC-Tree as a guide to visualize the output of the plot.simmatrix script.
'Species bOundary Delimitation using Astral' (SODA) [64] is a multi-locus species delimitation method based on quartet frequencies implemented in ASTRAL-III. SODA tests the null hypothesis that a branch has a length of zero, and if the null hypothesis is not rejected, SODA collapses the branch subtending two samples into a "cluster species." Branches that reject the null hypothesis (i.e., positive branch length) are then used to delimit species [64]. SODA makes two assumptions: 1) there is no gene flow and 2) no population structure; it is worth noting that if these assumptions are violated, SODA can potentially delimit species inaccurately (e.g., over split population-level structure) [64]. We generated gene trees for the reduced dataset in IQTREE-2 and we used the SMC-Tree inferred in BEAST during the STACEY analysis as a guide tree. Both gene trees and the SMC-Tree were used as input for the SODA analysis.

Biogeographic analyses
Middle America [68,69] represents one of the most geologically complex regions on the planet, composed of island-like geological blocks (viz., Maya, Chortis, Chorotega, and Chocó, Fig 1) [ [69][70][71] that, while presently connected, have undergone multiple episodes of isolation and connectivity at different geological times [72][73][74]. This variation in connectivity among geological blocks likely played a role creating suitable environments at different time periods that allowed the colonization of Middle America from South America by freshwater fishes. Ostariophysan lineages (e.g., tetras and catfishes) [75] were able to colonize Middle America only after the initial closure of the Isthmus of Panama (~20 million years ago) [76][77][78].
We investigated patterns of colonization of trans-Andean Hyphessobrycon in Middle America [68,69] using geological blocks (viz., Maya, Chortis, Chorotega, and Chocó, Fig 1) [ [69][70][71] as biogeographic units. We downloaded occurrence records of Hyphessobrycon in Middle America from the global biodiversity information facility (GBIF; www.gbif.org) [79]. We plotted these records after removing those that fell outside the known range of the genus in the region. We generated "distributional" maps for all valid species of trans-Andean Hyphessobrycon (Fig 1). The species distribution maps were generated by merging major river basins in between the two most distant occurrence records for each species in QGIS 3.4 [80]. The data for digital elevation model used was elev 30s from WorldClim [81]. River basin boundaries follow those of HydroATLAS [82,83].
Due to the lack of fossils or geological constraints that will allow us to confidently date our phylogenomic hypothesis, we transformed our maximum likelihood tree (75% complete data matrix) into a chronogram that represents relative divergences time estimates instead of absolute ages. To generate the chronogram, we used the chronos function in the R package APE [84,85]. Then we pruned our chronogram using the function drop.tip from the package APE [84,85] to include only one tip per species or lineage of trans-Andean Hyphessobrycon and the species inferred as their sister South American group in our phylogenomic analyses. The pruned tree was used for downstream biogeographic analyses.
We estimated ancestral ranges in BioGeoBEARS [86] implemented in the software RASP v.4 [87]. We mapped the distribution of each species in each geological block onto the pruned tree that we used as input for RASP. Hyphessobrycon occurs exclusively in freshwater systems and there is no evidence that it can tolerate salinity. Thus, we assume that dispersal to new areas (biogeographic units) is constrained to existing freshwater connections during geologic history. River capture [88] and river anastomosis (e.g., Dias et al. [89]) are the two main processes invoked to explain dispersal (species level) or geo-dispersal (assemblage level) of freshwater taxa across adjacent basins [2]. Therefore, our biogeographic inference was constrained to only allow connections (i.e., dispersal) between adjacent biogeographic units (i.e., geological blocks; Fig 1). Based on the present-day disjunct distributions of all species of trans-Andean Hyphessobrycon (Fig 1), we constrained our analyses to allow only a maximum of two geological blocks in which one species can be distributed at a time during their evolutionary history. We compared the fit of three different parametric biogeographic models to our data based on the Akaike Information Criterion: Dispersal-Vicariance (DIVA-Like) [86,90] Dispersal-Extinction-Cladogenesis (DEC-Like) [91] and BAYAREA-Like [92]. All the models were compared with and without the "jump-dispersal" (founder-event speciation) parameter (+J) [86].

Alignments summary
Phylogenomic inference. We assembled clean reads from 200 samples (see Methods) into contigs and identified a total of 2518 UCE loci with a mean length of 443.50 bp (SD = 217; min = 37,max = 1926 bp). The 75% complete data matrix contained 1682 UCEs, the 90% complete contained 1258 UCEs, and the 95% complete contained 838 UCEs. After assembly, alignment, and trimming the average alignment length for the 75% complete matrix was 499. For the reduced dataset, we assembled clean reads of 36 samples of trans-Andean Hyphessobrycon (S1 Table)

Phylogenomic inference
Different levels of missing data did not seem to have an effect on our concatenated phylogenomic inference. Overall, relationships were congruent among the three datasets (i.e., 75%, 90%, and 95% complete matrices; S1 Fig) and the majority of nodes were recovered with high support (UFBoot2 = 100, SH-aLRT = 100). Therefore, we conducted our coalescent, gene concordance factors, normalized quartet scores, and biogeographic analyses based on the gene trees and topology (concatenated and coalescent) inferred using the data matrix with a larger number of loci (75% complete data matrix). We rooted all topologies with the branch subtending to the sub-family Spintherobolinae (Figs 2 and S1) following previous phylogenomic hypotheses of the family Characidae [9].

Investigation of cryptic diversity
The species tree inferred from the reduced dataset of 36 samples and no missing data supported the same topology of trans-Andean Hyphessobrycon species as the concatenated and coalescent trees of the complete data set (Figs 2 and S1). Both species delimitation methods suggested cryptic diversity within trans-Andean Hyphessobrycon (Fig 4). In the STACEY analysis, we investigated alternative clustering schemes after summarizing 70% of the distribution of post-burnin sampled trees. We identified three clustering schemes: the "optimal" clustering scheme recovered nine lineages within the trans-Andean Hyphessobrycon and was observed in 48% of the sampled trees, the second clustering scheme recovered 10 lineages and was observed in 15% of the sampled trees, and the third clustering scheme recovered 11 lineages and was observed in 9% of the sampled trees (Fig 4). In contrast, in the SODA analysis, we identified 15 lineages within the samples of trans-Andean Hyphessobrycon analyzed (Fig 4).
Both STACEY and SODA congruently uncovered cryptic lineages within samples of H. compressus and H. condotensis (Fig 4). It is worth noting that the optimal clustering scheme in STACEY (i.e., 0.48) grouped all samples of H. columbianus and H. "condotensis" into a single cluster, in contrast to the two minority schemes (i.e., 0.15 and 0.09) which recovered H.  S1 Table). The vertical bars to the right represent the output of the species delimitation analysesnumber of lineages or clusters and assignment of individuals to each one of them in SODA and STACEY. STACEY 0.48 is the "optimal" number of clusters, STACEY 0.15 is the 2 nd alternative number of clusters, and STACEY 0.09 is the 3 rd alternative number of clusters recovered. The color scheme of the bars corresponds to the color scheme in Fig 1. Illustration  columbianus and H. "condotensis" as independent clusters (Fig 4). For the optimal number of clusters, the probabilities that all samples belong to their assigned "species cluster" were high and ranged from p = 0.79-1.0 (Fig 5), with the exception of H. "condotensis" and H. columbianus that showed a probability of p = 0.55 for belonging to the same cluster (Fig 5). The SODA analysis further suggested that there is cryptic diversity within H. "compressus", H. tortuguerae, H. bussingi, H. columbianus, and H. condotensis (Fig 4).

Biogeographic analyses
We identified the DEC-like model as the best fit biogeographical model for our dataset ( Table 1). The DEC-like model was the best fit model in both model selection analyses, with and without the +J parameter (Tables 1 and S1). Following a recent critique to the usage of the +J parameter (see Ree & Sanmartín [93]) we present our results based on the analysis without the + J parameter (Table 1).
Based on the DEC-like model, we inferred a wide ancestral range east of the Andes and in the Chocó block (AB, Fig 6) for the most recent common ancestor (MRCA) of trans-Andean Hyphessobrycon and its sister cis-Andean group (node 25 in Fig 6). We inferred six dispersal events and three vicariance events in the clade of trans-Andean Hyphessobrycon. At crown node 25, we inferred a dispersal event into the Chorotega block and a vicariant event that led to the MRCA of all trans-Andean Hyphessobrycon (Fig 6). The most likely estimated ancestral range of the MRCA of trans-Andean Hyphessobrycon included the Chocó and Chorotega blocks (node 22, Fig 6). At node 22, we inferred a vicariant event that led to the divergence of the Chocó species group and the Central American species of Hyphessobrycon. The most likely estimated ancestral ranges of the MRCA of the Chocó species group (node 21) and Central American Hyphessobrycon (node 18) were the Chocó and Chorotega blocks, respectively (Fig  6). At node 18, we inferred a dispersal event into the Chortis block and a vicariant event that led to the split of the 'panamensis' species group and the 'compressus' species group. The most likely ancestral range of the MRCA of the 'panamensis' group (node 17) was the Chorotega, and the Chortis block for the 'compressus' species group (node 15) (Fig 6). Subsequently, the MRCA of the 'compressus' species group dispersed into the Maya block (node 15; Fig 6). Finally, we inferred three dispersal events within the Chorotega block, one at node 16, and two at node 14 (S2 Table).

Discussion
We present the first and most complete phylogenomic hypothesis of trans-Andean Hyphessobrycon (including H. compressus). Our analyses overwhelmingly support the monophyly of trans-Andean species of Hyphessobrycon, and we find a complete lack of evidence supporting the type species of the genus, H. compressus, as part of the cis-Andean 'rosy tetra' clade (Figs 2 and 3 and S1). In agreement with the hypothesis and discussion of García-Alzate et al. [33], the phylogenetic placement of H. compressus warrants the trans-Andean species to be treated as Hyphessobrycon sensu stricto, whereas all cis-Andean species should be treated tentatively as 'Hyphessobrycon' incertae sedis until a) it can be tested if they belong to Hyphessobrycon sensu stricto or b) assigned to different genera.

Phylogenetic relationships
Concatenated and coalescent analyses of 1682 UCE loci are concordant with previous hypotheses that species currently recognized as Hyphessobrycon are polyphyletic (Fig 2; e.g., [14,17,31]). We recovered strong support for the monophyly of the trans-Andean species of Hyphessobrycon (Figs 2 and 3 and S1). A clade comprised of Macropsobrycon xinguensis, Hemigrammus levis, and Moenkhausia gracilima all distributed in the Amazon River basin, was inferred as sister to the trans-Andean Hyphessobrycon clade (Figs 2 and 3). Despite the strong support for this sister relationship with trans-Andean Hyphessobrycon (LPP = 1, UFBoot2 = 100, SH-aLRT = 100; Figs 2 and S1), an alternative hypothesis (i.e., H. diancistrus + trans-Andean Hyphessobrycon) was observed in roughly equal frequency (S2 Fig). ., H. bayleyi, H. diancistrus, and H. otrynus) from the Amazon basin has been suggested based on unpublished molecular data, see Lima et al. [94]. Our phylogenomic hypothesis unambiguously placed H. compressus in a clade exclusively composed of trans-Andean LnL = log-likelihood score for the model. k = number of parameters, d = dispersal rate, e = extinction rate or range loss rate AICc = Akaike information criterion corrected. AICc wt = Akaike information criterion corrected weighted. The best biogeographic model was selected based on the highest AICc wt, in bold. https://doi.org/10.1371/journal.pone.0279924.t001

PLOS ONE
Hyphessobrycon (Figs 2 and 3 and S1) and challenges previous hypotheses that suggest that H. compressus is closely allied with cis-Andean taxa [14,31,32,94]. Undoubtedly, future work that includes more taxa of the subfamily Stethaprioninae will enable us to more confidently place the trans-Andean Hyphessobrycon within the most diverse subfamily of Characidae [21].

Phylogenetic relationships and cryptic diversity of trans-Andean Hyphessobrycon
Within the trans-Andean Hyphessobrycon, our results support the recognition of three subclades: the Chocó species group, the 'panamensis' species group, and the 'compressus' species group. Our analysis recovered the Chocó species group sister to a clade composed of the 'panamensis' + 'compressus' species groups (Fig 3A). Furthermore, our species delimitation approaches uncovered cryptic diversity in two species of the Chocó and the 'compressus' species groups: H. condotensis and H. compressus, respectively (Figs 4 and 5). Furthermore, we interpret the inferred cryptic diversity within H. "compressus", H. tortuguerae, H. bussingi, and H. columbianus by the SODA analysis (Fig 4) as population-level structure (see Methods). The Chocó species group, which is restricted to aquatic systems in the Chocó block in northwestern South America and eastern Panama (Fig 1) (Figs 4 and 5). It is worth noting that H. "condotensis" is closely related to H. columbianus, and both lineages are distributed in aquatic systems that drain to the Atlantic slope of northwestern South America and eastern Panama. In contrast, samples assigned to H. condotensis were collected in the San Juan and Baudo rivers which drain into the Pacific slope of Colombia (Fig 1). Despite the lack of samples of H. daguae in our study, we hypothesize that this species belongs to the Chocó species group because of the observed correlation of phylogeny (i.e., species groups) with geographic distributions (i.e., geological blocks). Furthermore, close relationships of H. daguae with species of Hyphessobrycon in northwestern South America has been proposed based on morphological characters, see Ota et al. [26]. The taxonomy of Hyphessobrycon species distributed in the Chocó block has remained in flux [26,33,95] and the presence of cryptic diversity has been suggested [96]. Our work uncovered cryptic diversity in the region (Figs 4 and 5) and disagrees with the current taxonomic hypothesis of four valid species of Hyphessobrycon (i.e., H. columbianus, H. condotensis, H. daguae, and H. ecuadoriensis) distributed in the Chocó block (see Ota et al. [26]). For example, the species H. sebastiani was proposed as a junior synonym of H. condotensis based on morphological and coloration characters [26]. Our robust phylogenomic hypothesis (Figs 2 and 3 and S1) coupled with the investigation of cryptic diversity (Figs 4 and 5) do not support the recognition of H. condotensis as a single widespread species in the Pacific and Atlantic slope of Colombia (Fig 1) proposed by Ota et al. [26]. We propose two hypotheses for the identity of the cryptic H. "condotensis" lineage: a) it belongs to the previously recognized H. sebastiani, which has been reported to be distributed in the Upper Atrato basin [33] or b) it represents a newly recovered lineage of the Chocó species group. Further work is needed to clarify the taxonomic identity of H. "condotensis" in the Upper Atrato River. Finally, despite the fact that our coalescent and concatenation analyses both recover the monophyly of the Chocó species group (Figs 2 and S1), gCF and NQS suggest that the placement of H. ecuadoriensis is not unambiguously resolved. Specifically, we observed a similar proportion of gene trees that support recognition of the Chocó species group excluding H. ecuadoriensis. This hypothesis resolved H. ecuadoriensis as the sister to all trans-Andean Hyphessobrycon (T2; Fig 3D). Future work that includes more robust sampling is needed to better understand and recognize the diversity of Hyphessobrycon in the Chocó block and will help to clarify the phylogenetic relationships among this group and within trans-Andean Hyphessobrycon.
The 'panamensis' species group is comprised of three species, H. savagei, H. panamensis, and the recently described H. bussingi [26]. These three species are mainly distributed in the Pacific (H. savagei) and Atlantic (H. bussingi and H. panamensis) slopes of the Chorotega block [71]. The sister relationship of the two Atlantic species relative to the Pacific one suggests a possible vicariant role of the orogenic process giving rise to the Talamanca mountain range [97] that split the MRCA of the 'panamensis' species group into two Pacific and Atlantic lineages. Following divergence, speciation in situ may have occurred within the Atlantic slope of the Chorotega block. The 'compressus' species group is comprised of three lineages: H. compressus, H. "compressus", and H. tortuguerae. The recovered close relationships between these three lineages support the hypothesis of Böhlke [30], who allies H. compressus, H. tortuguerae, and H. milleri (currently synonymized under H. compressus, see below) regardless of differences in coloration among other characters [30]. The uncovering of cryptic diversity within H. compressus (Fig 1) is remarkable considering a recent and thorough systematic work that redescribed H. compressus and recognized it as a widespread species (Fig 1) across Mexico, Guatemala and Belize "without detected variations in meristic and morphometric data" [31]. These results led to the recognition of H. milleri as junior synonym of H. compressus [31]. Interestingly, our sampling across the distribution of H. compressus (Fig 1) mirrors the distribution of specimens analyzed in the previous morphological study (see fig. 7 of Carvalho & Malabarba [31]), which highlights the conserved morphology of some lineages of trans-Andean Hyphessobrycon (Figs 4 and 5). We propose two plausible hypotheses for the identity of the cryptic H. "compressus" lineage a) it belongs to the previously recognized H. milleri, hence extending its distribution into southern and northern Belize, or b) it represents a newly uncovered lineage of the 'compressus' species group. Specimens of Hyphessobrycon in the Motagua River basin, where the type locality of H. milleri is located, have not been collected since 1974 despite recent collecting efforts by the authors (DJE and CDM). The inclusion of samples from the Motagua basin for molecular analyses would be key for testing our proposed hypotheses regarding the identity of the H. "compressus" lineage.
Finally, the relationships recovered among the three species groups, (Chocó ('panamensis', 'compressus')), do not support the recent hypothesis of relationships for some species of trans-Andean Hyphessobrycon that proposed the 'H. panamensis species-group' sensu Ota et al. [26] is comprised of H. bussingi, H. columbianus, H. condotensis, H. daguae, H. panamensis, and H. savagei. Coloration patterns alone or in combination with morphological characters have been used to propose relationships among species of trans-Andean Hyphessobrycon. For example, it has been suggested that pigmentation (or lack thereof) on the dorsal fin reflects relationships among species of trans-Andean Hyphessobrycon including the 'H. panamensis species-group' [26,31,32]. However, our robust phylogenomic hypotheses do not support the monophyly of the 'H. panamensis species-group' sensu Ota et al. [26] and cast doubt on the use of these coloration patterns as phylogenetically informative characters for trans-Andean Hyphessobrycon. Finally, it is worth noting that for the Central American species of Hyphessobrycon (i.e. H. compressus, H. tortuguerae, H. savagei, H. milleri, and H. panamensis), three morphological synapomorphies that unite these species have been proposed: a) premaxilla with seven teeth in the inner row, b) round foramen in the ventral region of the quadrate, and c) two foramina in the ventral margin of the epiotic bone, see García-Alzate et al. [33]. Future work is necessary to test if these synapomorphies are diagnostic of Hyphessobrycon sensu stricto.

Colonization of Middle America
The best-fit biogeographical model first inferred a vicariant event separating sister cis-Andean taxa and trans-Andean Hyphessobrycon (node 25; Fig 6). This event was plausibly promoted by orogenic processes in northwestern South America, like the uplift of the Andes during the Miocene [98,99], that played an important role shaping the riverscape that promoted the origin of trans-Andean ichthyofauna [78,[100][101][102].
Our results support a single colonization event of Middle America by Hyphessobrycon. The use of geological blocks as biogeographic units allowed us to uncover a stepwise pattern of colonization from South (Chocó block) to North (Maya block), followed by in-situ diversification within each geological block (Fig 6). Our ancestral range estimation inferred that the MRCA of all trans-Andean Hyphessobrycon was distributed in an area comprising the Chocó and Chorotega blocks (node 22; Fig 6). In this region, another vicariant event led to the separation of the Chocó and 'panamensis' species groups. Our biogeographical inferences together with the present-day distribution of the Chocó and 'panamensis' species groups agree with the early colonization model of ostariophysan fishes of Bermingham and Martin [103]. This model suggests an early colonization into Central America (Chorotega block) during the late Miocene, followed by extinction (in some taxa) in some river basins due to changes in sea levels during the Pliocene [77,78,100,[102][103][104][105]. Furthermore, our current understanding of the complex landscape in lower Middle America and the timing of the initial closure of the Isthmus of Panama (~20 million years ago) [76,106] provides evidence of connectivity between the Chocó and Chorotega blocks during the Miocene.
From the Chorotega block, we inferred a range expansion into the Chortis block followed by a vicariant event that separated the 'panamensis' and 'compressus' species groups. The range fragmentation of the MRCA of the Central American species of Hyphessobrycon may be explained by a marine corridor across the Nicaragua depression, which was located between the Chorotega and Chortis blocks until the Pliocene [72]. Global sea level changes [107] have impacted the landscape of Middle America (see Bagley and Johnson [108] and references therein). Due to physiological constraints of Hyphessobrycon (i.e., salinity intolerance), a marine corridor likely extirpated ancestral populations and limited gene flow between populations on either side of the barrier that gave origin to the 'panamensis' and 'compressus' groups. Finally, present-day disjunct distributions between species in the 'compressus' group (Fig 1) preclude us from making inferences regarding the colonization route from the Chortis into the Maya block.

Conclusions
Historically, it has been challenging to resolve the phylogenetic relationships in some lineages of the highly diverse Characidae, in part due to their conserved morphologies [14,32,109] or homoplasy of phylogenetic characters [32,78,110]. Our robust phylogenomic hypothesis resolves the placement of H. compressus together with the other trans-Andean species, instead of with the cis-Andean species of the 'rosy tetra' clade where it had been historically grouped [31,32]. Our hypothesis supports that Hyphessobrycon sensu stricto should be restricted to trans-Andean taxa. Tentatively cis-Andean species of Hyphessobrycon should be treated as 'Hyphessobrycon' incertae sedis until further work can help to elucidate to which genera they are closely allied. Future work that incorporates more taxa within Stethaprioninae will help us to more confidently place Hyphessobrycon sensu stricto within this subfamily.
Our results bring into question the utility of coloration patterns as phylogenetically informative characters for trans-Andean Hyphessobrycon. The unveiling of cryptic diversity highlights the conserved morphology among divergent lineages of trans-Andean Hyphessobrycon. Finally, we propose an interesting biogeographic model for the colonization of Middle America by ostariophysan fishes, in which the ancestral lineages first colonized the geological blocks in a stepwise fashion, and after colonization, in-situ speciation took place within each geological block.

S1 Fig. Phylogenomic relationships of trans-Andean Hyphessobrycon based on concatenated analysis of ultraconserved elements.
A) Inferred phylogeny based on the 75% complete data matrix B) inferred phylogeny based on the 90% complete data matrix, and C) inferred phylogeny based on the 75% complete data matrix. All nodes are supported with ultrafast bootstrap (UFBoot2) = 100 and SH-like approximate ratio test (SH-aLRT) = 100 unless noted. Nodes with gray circles UFBoot2 < 90 and SH-aLRT < 90. Species names with asterisk indicates samples from Melo et al. [9]. (PDF)  2). B) trans-Andean Hyphessobrycon sister to H. diancistrus. C) trans-Andean Hyphessobrycon sister other characins of the subfamily Stethaprioninae. Number of trees decisive for the branch (gN) = 1646, effective number of genes for a branch of interest (EN) = 1220.72. gCF = gene concordance factor, gDF1 = gene discordance factor for NNI-1 branch, gDF2 = gene discordance factor for NNI-2 branch, gene discordance factor due to polyphyly (gDFP) = 65.37. NQS = normalized quartet score. (PDF) S1 Table. List of all samples included in this study and summary statistics of the sequencing output recovered for each one of them. Samples in bold were used for the cryptic diversity analyses. Museum codes follow Sabaj (2020). Newly generated sequence data is archived in the Sequence Archive Repository (SRA) BioProject PRJNA887072. (XLSX) S2 Table. Parameters estimates of six different biogeographic models. LnL = log-likelihood score for the model. k = number of parameters, d = dispersal rate, e = extinction rate or range loss rate, j = jum-dispersal rate, AICc = Akaike information criterion corrected. AICc wt = Akaike information criterion corrected weighted. The best biogeographic model was selected based on the highest AICc wt, in bold. (PDF)